Multi-Objective X-Armed Bandits

Authors

  • Kristof Van Moffaert
  • Kevin Van Vaerenbergh
  • Peter Vrancx
  • Ann Nowé
Abstract

Many standard optimization algorithms focus on optimizing a single, scalar feedback signal. However, real-life optimization problems often require the simultaneous optimization of more than one objective. In this paper, we propose a multi-objective extension to the standard X-armed bandit problem. As the feedback signal is now vector-valued, the goal of the agent is to sample actions in the Pareto dominating area of the objective space. To this end, we propose the multi-objective Hierarchical Optimistic Optimization strategy, which discretizes the continuous action space in relation to the Pareto optimal solutions obtained in the objective space. We experimentally validate the approach on two well-known multi-objective test functions and a simulation of a real-life application, the filling phase of a wet clutch. We demonstrate that the strategy identifies the Pareto front after just a few epochs and samples accordingly. After learning, several multi-objective quality indicators show that the set of solutions sampled by the algorithm closely approximates the Pareto front.
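The central idea above, sampling in the Pareto dominating area of the objective space, rests on the Pareto-dominance relation between vector-valued rewards. The minimal Python sketch below illustrates that relation only, not the paper's MO-HOO tree search; the helper names dominates and pareto_front are hypothetical.

    import numpy as np

    def dominates(a, b):
        # Reward vector a Pareto-dominates b if it is at least as good
        # in every objective and strictly better in at least one.
        a, b = np.asarray(a), np.asarray(b)
        return bool(np.all(a >= b) and np.any(a > b))

    def pareto_front(points):
        # Filter sampled reward vectors down to the non-dominated subset.
        points = [np.asarray(p) for p in points]
        return [p for i, p in enumerate(points)
                if not any(dominates(q, p)
                           for j, q in enumerate(points) if j != i)]

    # Two-objective samples: (0.4, 0.4) and (0.8, 0.1) are dominated.
    samples = [(0.9, 0.2), (0.5, 0.5), (0.2, 0.9), (0.4, 0.4), (0.8, 0.1)]
    print(pareto_front(samples))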


Related articles

Knowledge Gradient for Multi-objective Multi-armed Bandit Algorithms

We extend the knowledge gradient (KG) policy to the multi-objective multi-armed bandit problem in order to efficiently explore the Pareto optimal arms. We consider two partial order relationships to order the mean vectors, i.e. Pareto dominance and scalarization functions. Pareto KG finds the optimal arms using Pareto search, while scalarized KG transforms the multi-objective arms into one-objective arms to fin...
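The abstract does not spell out the scalarization functions it uses; a common choice is linear scalarization, sketched below. The function name linear_scalarize, the weights, and the example values are illustrative, not taken from the paper.

    import numpy as np

    def linear_scalarize(mean_vectors, weights):
        # Collapse each multi-objective mean vector into one scalar via a
        # weighted sum, so a single-objective policy such as KG can rank arms.
        return np.asarray(mean_vectors) @ np.asarray(weights)

    # Three arms, two objectives, equal weights.
    means = [[0.8, 0.1], [0.4, 0.5], [0.1, 0.9]]
    print(linear_scalarize(means, [0.5, 0.5]))  # [0.45 0.45 0.5]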


Budgeted Bandit Problems with Continuous Random Costs

We study the budgeted bandit problem, where each arm is associated with both a reward and a cost, and the objective is to design an arm-pulling algorithm that maximizes the total reward before the budget runs out. In this work, we study both multi-armed bandits and linear bandits, and focus on the setting with continuous random costs. We propose an upper confiden...
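As a rough illustration of the budgeted setting (not the truncated algorithm above), the hypothetical sketch below pulls the arm with the highest optimistic estimate of its reward-to-cost ratio until the random, continuous costs exhaust the budget.

    import math
    import random

    def budgeted_ucb(arms, budget):
        # Each arm is a callable returning a (reward, cost) sample.
        n = len(arms)
        pulls = [0] * n
        reward_sum = [0.0] * n
        cost_sum = [0.0] * n
        total_reward, t = 0.0, 0
        while budget > 0.0:
            t += 1
            if 0 in pulls:
                i = pulls.index(0)  # pull every arm once first
            else:
                def ratio_ucb(j):
                    # Optimistic reward estimate divided by average cost.
                    bonus = math.sqrt(2.0 * math.log(t) / pulls[j])
                    avg_cost = max(cost_sum[j] / pulls[j], 1e-9)
                    return (reward_sum[j] / pulls[j] + bonus) / avg_cost
                i = max(range(n), key=ratio_ucb)
            reward, cost = arms[i]()  # the last pull may overshoot the budget
            pulls[i] += 1
            reward_sum[i] += reward
            cost_sum[i] += cost
            budget -= cost
            total_reward += reward
        return total_reward

    random.seed(0)
    arms = [lambda m=m: (random.gauss(m, 0.1), abs(random.gauss(1.0, 0.2)))
            for m in (0.3, 0.6, 0.9)]
    print(budgeted_ucb(arms, budget=50.0))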


Modal Bandits

Analyses of multi-armed bandits primarily presume that the value of an arm is its expected reward. We introduce a theory for multi-armed bandits where the values are the modes of the reward distributions.
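A minimal sketch of the idea, assuming the mode is estimated from a histogram of observed rewards; the estimator name empirical_mode and the bin count are illustrative, and the paper's theory is not reproduced here.

    import numpy as np

    def empirical_mode(samples, bins=20):
        # Value an arm by the centre of the densest histogram bin,
        # rather than by the sample mean.
        counts, edges = np.histogram(samples, bins=bins)
        k = int(np.argmax(counts))
        return 0.5 * (edges[k] + edges[k + 1])

    # A skewed distribution where mean and mode disagree:
    rewards = np.random.default_rng(0).exponential(scale=1.0, size=10_000)
    print(rewards.mean(), empirical_mode(rewards))  # mean ~ 1.0, mode near 0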


Risk-Aversion in Multi-armed Bandits

Stochastic multi-armed bandits solve the exploration-exploitation dilemma and ultimately maximize the expected reward. Nonetheless, in many practical problems, maximizing the expected reward is not the most desirable objective. In this paper, we introduce a novel setting based on the principle of risk-aversion where the objective is to compete against the arm with the best risk-return trade-off...
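The abstract does not fix a particular risk measure; one common way to encode a risk-return trade-off is the mean-variance criterion, sketched below with an illustrative risk-aversion parameter rho.

    import numpy as np

    def mean_variance(samples, rho=1.0):
        # Risk-averse value: penalize the empirical mean by the empirical
        # variance, so a stabler arm can beat one with a higher mean.
        samples = np.asarray(samples)
        return samples.mean() - rho * samples.var()

    rng = np.random.default_rng(1)
    risky = rng.normal(1.0, 2.0, 5_000)   # higher mean, high variance
    stable = rng.normal(0.8, 0.1, 5_000)  # lower mean, low variance
    print(mean_variance(risky), mean_variance(stable))  # the stable arm wins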


Multi-Armed Bandits for Addressing the Exploration/Exploitation Trade-off in Self Improving Learning Environment

This project proposes the use of machine learning techniques such as multi-armed bandits to implement self-improving learning environments. The goal of a self-improving learning environment is to make good pedagogical choices while measuring the efficiency of these choices. Students are modeled using the LFA model, fitted to a dataset of university courses to simulate...





Publication date: 2014